Search Result

Recommendation method based on k nearest neighbors using data dimensionality reduction and exact Euclidean locality-sensitive hashing

GUO Yudong, GUO Zhigang, CHEN Gang, WEI Han

Journal of Computer Applications 2017, 37 (9): 2665-2670. DOI: 10.11772/j.issn.1001-9081.2017.09.2665

Recommendation methods based on k nearest neighbors suffer from several problems, such as the high dimensionality of rating features, slow nearest-neighbor search, and the rating cold start problem. To solve these problems, a recommendation method based on k nearest neighbors using data dimensionality reduction and Exact Euclidean Locality-Sensitive Hashing (E²LSH) was proposed. Firstly, the rating data, the user attribute data and the item category data were integrated as the input data to train a Stacked Denoising Auto-encoder (SDA) neural network, whose last hidden layer values were used as the feature coding of the input data to complete data dimensionality reduction. Then, an index of the dimension-reduced data was built with the E²LSH algorithm, and the target users or the target items were looked up in it to retrieve their nearest neighbors. Finally, the similarities between the target and its neighbors were calculated, and the target user's similarity-weighted predicted rating for the target item was obtained. The experimental results on standard data sets show that the mean square error of the proposed method is reduced by about 7.2% on average compared with the recommendation method based on Locality-Sensitive Hashing (LSH-ICF), and that the average run time of the proposed method is the same as that of LSH-ICF. This shows that the proposed method alleviates the rating cold start problem while retaining the efficiency of LSH-ICF.
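
The retrieval and prediction steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard E²LSH hash family h(v) = floor((a·v + b) / w) with Gaussian a and uniform b, skips the auto-encoder stage by taking already-reduced vectors as input, and uses 1 / (1 + distance) as a stand-in similarity. The class name `E2LSHIndex`, the function `predict_rating`, and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Toy E2LSH index: several hash tables, each keyed by a tuple of hashes."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.vectors = {}
        self.tables = []
        for _ in range(num_tables):
            # Each hash function is a pair (a, b): Gaussian vector, uniform offset.
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                      rng.uniform(0.0, bucket_width))
                     for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # h(v) = floor((a . v + b) / w), concatenated over the table's functions.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, v)) + b) / self.w)
            for a, b in funcs)

    def add(self, item_id, v):
        self.vectors[item_id] = v
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(item_id)

    def query(self, v, k):
        # Collect candidates that share a bucket with v in any table,
        # then rank them by exact Euclidean distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, v), []))
        ranked = sorted(candidates, key=lambda i: math.dist(v, self.vectors[i]))
        return ranked[:k]


def predict_rating(index, target_vec, ratings, k=3):
    """Similarity-weighted average of the neighbors' known ratings."""
    neighbors = [n for n in index.query(target_vec, k) if n in ratings]
    if not neighbors:
        return None
    # Assumed similarity: 1 / (1 + Euclidean distance).
    sims = [1.0 / (1.0 + math.dist(target_vec, index.vectors[n]))
            for n in neighbors]
    return sum(s * ratings[n] for s, n in zip(sims, neighbors)) / sum(sims)
```

A vector that was added to the index always collides with itself in every table, so querying with it returns at least that entry; more distant vectors are found only with some probability, which is the trade-off that makes the lookup fast.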

Fine-grained sentiment analysis oriented to product comment

LIU Li, WANG Yongheng, WEI Hang

Journal of Computer Applications 2015, 35 (12): 3481-3486. DOI: 10.11772/j.issn.1001-9081.2015.12.3481

Traditional sentiment analysis is coarse-grained and ignores comment targets, while existing fine-grained sentiment analysis ignores multi-target and multi-opinion sentences. To solve these problems, a fine-grained sentiment analysis method based on Conditional Random Field (CRF) and syntax tree pruning was proposed. A parallel tri-training method based on MapReduce was used to label the corpus automatically. A CRF model integrating various features was used to extract positive/negative opinions and opinion targets from comment sentences. To deal with multi-target and multi-opinion sentences, syntax tree pruning was performed by building a domain ontology and a syntactic path library, so as to eliminate irrelevant opinion targets and extract the correct appraisal expressions. Finally, a visual product attribute report was generated. After syntax tree pruning, the accuracy of the proposed method on sentiment elements and appraisal expressions reaches approximately 89%. The experimental results on two product domains, mobile phones and cameras, show that the proposed method outperforms traditional methods in both sentiment analysis accuracy and training performance.
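
The automatic labeling step relies on tri-training, whose core rule is that an unlabeled example is offered to one classifier as a pseudo-labeled training example when the other two classifiers agree on its label. The sketch below shows only that agreement rule. It leaves out the error-rate conditions of full tri-training and the MapReduce parallelization, and the function `tri_training_round` and the keyword classifiers in the usage note are hypothetical stand-ins, not the models used in the paper.

```python
def tri_training_round(classifiers, unlabeled):
    """One agreement pass of tri-training over three classifiers.

    classifiers: list of three callables mapping an example to a label.
    unlabeled:   iterable of unlabeled examples.
    Returns a dict mapping each classifier index to the (example, label)
    pairs proposed to it by the other two classifiers.
    """
    proposals = {i: [] for i in range(3)}
    for x in unlabeled:
        preds = [clf(x) for clf in classifiers]
        for i in range(3):
            # The two classifiers other than i act as teachers for i.
            j, k = [m for m in range(3) if m != i]
            if preds[j] == preds[k]:
                proposals[i].append((x, preds[j]))
    return proposals
```

For instance, with three toy keyword classifiers (one keyed on "good", one on "good" or "great", one on "great"), the comment "poor battery" is labeled negative by all three and is therefore proposed to each of them, whereas "good screen" is proposed only to the third classifier, because only the first two agree on it.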

MapReduce Based Image Classification Approach

WEI Han, ZHANG Xueqing, CHEN Yang

Journal of Computer Applications 2014, 34 (6): 1600-1603. DOI: 10.11772/j.issn.1001-9081.2014.06.1600

Many existing image classification algorithms cannot be applied to big image data. A new MapReduce-based approach was proposed to accelerate the classification of big image data. The whole image classification process was restructured to fit the MapReduce programming model. First, Scale Invariant Feature Transform (SIFT) features were extracted with MapReduce and then converted into sparse vectors by sparse coding to obtain the sparse features of each image. MapReduce was also used for the distributed training of a random forest, on the basis of which the classification of big image data was carried out in parallel. The MapReduce-based algorithm was evaluated on a Hadoop cluster. The experimental results show that the proposed approach can classify images in parallel on a Hadoop cluster with a good speedup.
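
How a random forest's classification step fits the MapReduce model can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the paper's Hadoop job: the map step emits one (image id, label) vote per tree, the shuffle groups votes by image id, and the reduce step takes the majority vote. The trees are hypothetical callables over precomputed feature vectors; SIFT extraction, sparse coding and distributed training are out of scope here, and the function names are illustrative only.

```python
from collections import Counter, defaultdict


def map_votes(tree, images):
    """Map step: one tree emits an (image_id, predicted_label) pair per image."""
    for image_id, features in images:
        yield image_id, tree(features)


def reduce_votes(image_id, labels):
    """Reduce step: majority vote over all labels emitted for one image."""
    return image_id, Counter(labels).most_common(1)[0][0]


def classify(trees, images):
    """Simulate map, shuffle and reduce in one process."""
    grouped = defaultdict(list)  # shuffle: group the votes by image id
    for tree in trees:
        for image_id, label in map_votes(tree, images):
            grouped[image_id].append(label)
    return dict(reduce_votes(i, labels) for i, labels in grouped.items())
```

Because each tree votes independently and each image is reduced independently, both steps can in principle be spread across cluster nodes, which is where a speedup of the kind reported in the abstract would come from.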